YUVMultiNet: Real-time YUV multi-task CNN for autonomous driving
In this paper, we propose a multi-task convolutional neural network (CNN)
architecture optimized for a low power automotive grade SoC. We introduce a
network based on a unified architecture where the encoder is shared between the
two tasks, namely detection and segmentation. The proposed network runs at
25 FPS at 1280x800 resolution. We briefly discuss the methods used to optimize
the network architecture, such as using native YUV images directly, optimizing
layers and feature maps, and applying quantization. We also focus on memory
bandwidth in our design, as convolutions are data intensive and most SoCs are
bandwidth bottlenecked. We then demonstrate the efficiency of our proposed
network on a dedicated CNN accelerator, presenting the key performance
indicators (KPI) for the detection and segmentation tasks obtained from the
hardware execution and the corresponding run-time.
Comment: This paper is accepted for CVPR workshop dem
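
To make the bandwidth argument concrete, the following is a minimal sketch of the input-side arithmetic at the stated 1280x800 resolution and 25 FPS. The abstract does not say which YUV layout is used; the sketch assumes 8-bit YUV 4:2:0 (12 bits per pixel) and compares it with 8-bit RGB (24 bits per pixel). The function names are hypothetical and the figures cover only the raw input frames, not intermediate feature maps or any colour-conversion step.

```python
# Illustrative input-bandwidth arithmetic. The pixel formats are assumptions,
# not details taken from the paper; only resolution and FPS come from the abstract.
WIDTH, HEIGHT, FPS = 1280, 800, 25


def frame_bytes(width, height, bits_per_pixel):
    """Bytes needed to store one frame at the given average bits per pixel."""
    return width * height * bits_per_pixel // 8


def input_bandwidth(width, height, bits_per_pixel, fps):
    """Bytes per second read from memory just to fetch the input frames."""
    return frame_bytes(width, height, bits_per_pixel) * fps


# Assumed formats: RGB888 = 24 bits/pixel, YUV 4:2:0 (8-bit) = 12 bits/pixel.
rgb888 = input_bandwidth(WIDTH, HEIGHT, 24, FPS)
yuv420 = input_bandwidth(WIDTH, HEIGHT, 12, FPS)

print(f"RGB888 input: {rgb888 / 1e6:.1f} MB/s")
print(f"YUV420 input: {yuv420 / 1e6:.1f} MB/s")
```

Under these assumptions the YUV 4:2:0 input needs half the bytes of the RGB input, which is one plausible reason why consuming the native camera format helps on a bandwidth-limited SoC.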
Design of Real-time Semantic Segmentation Decoder for Automated Driving
Semantic segmentation remains a computationally intensive algorithm for
embedded deployment even with the rapid growth of computation power. Thus,
efficient network design is a critical aspect, especially for applications like
automated driving, which require real-time performance. Recently, there has
been a lot of research on designing efficient encoders that are mostly task
agnostic. Unlike in image classification and bounding-box object detection,
decoders are also computationally expensive in semantic segmentation.
In this work, we focus on efficient design of the segmentation decoder and
assume that an efficient encoder is already designed to provide shared features
for a multi-task learning system. We design a novel efficient non-bottleneck
layer and a family of decoders which fit into a small run-time budget using
VGG10 as an efficient encoder. We demonstrate on our dataset that experimentation
with various design choices led to an improvement of 10% from a baseline
performance.
Comment: Accepted at VISAPP 201
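
As a rough illustration of why segmentation decoders are costly, the sketch below counts multiply-accumulate operations (MACs) for a single 3x3 convolution at two spatial resolutions. All numbers are hypothetical and are not taken from the paper: a 1280x800 input, 64 input and output channels, an encoder output at 1/32 resolution, and a decoder stage at 1/4 resolution. The function name `conv_macs` is likewise an assumption made for this example.

```python
def conv_macs(height, width, c_in, c_out, kernel=3):
    """MACs for one stride-1 convolution producing a height x width output map."""
    return height * width * c_in * c_out * kernel * kernel


# Hypothetical configuration, chosen only for illustration.
full_h, full_w = 800, 1280
channels = 64

# Same layer shape evaluated at a coarse (encoder output) and a fine (decoder) scale.
encoder_out = conv_macs(full_h // 32, full_w // 32, channels, channels)
decoder_mid = conv_macs(full_h // 4, full_w // 4, channels, channels)
ratio = decoder_mid // encoder_out

print(f"1/32 resolution: {encoder_out:,} MACs")
print(f"1/4 resolution:  {decoder_mid:,} MACs ({ratio}x)")
```

Because cost scales with the number of output positions, the same layer is 64 times more expensive at 1/4 resolution than at 1/32 resolution in this setup. A decoder has to work at these finer scales to produce per-pixel labels, which is consistent with the abstract's point that decoder design matters for the run-time budget.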